PyDigger - unearthing stuff about Python


Name       | Version     | Summary                                                                                                                                                                          | Date
minference | 0.1.5.post1 | Speeds up long-context LLM inference by computing attention with approximate, dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy. | 2024-08-13 09:39:09
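The summary refers to dynamic sparse attention: instead of scoring every query against every key during pre-filling, attention is computed exactly only on a small, input-dependent subset of the keys. The sketch below illustrates that general idea with top-k block selection; it is a minimal, self-contained approximation for illustration only, not MInference's actual algorithm (which uses head-specific sparse patterns), and all names in it are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def block_sparse_attention(q, k, v, block=64, topk=4):
    """Approximate attention: each query block attends only to the top-k
    key blocks, ranked by a cheap mean-pooled similarity estimate.
    Assumes the sequence length is a multiple of `block`."""
    n, d = q.shape
    nb = n // block
    qb = q.reshape(nb, block, d)
    kb = k.reshape(nb, block, d)
    vb = v.reshape(nb, block, d)
    # Cheap importance estimate: mean-pooled query block vs. mean-pooled key block.
    scores = qb.mean(axis=1) @ kb.mean(axis=1).T   # (nb, nb)
    out = np.empty_like(q).reshape(nb, block, d)
    for i in range(nb):
        keep = np.argsort(scores[i])[-topk:]       # indices of the top-k key blocks
        ks = kb[keep].reshape(-1, d)               # gathered keys
        vs = vb[keep].reshape(-1, d)               # gathered values
        att = softmax(qb[i] @ ks.T / np.sqrt(d))   # exact attention on the subset
        out[i] = att @ vs
    return out.reshape(n, d)

rng = np.random.default_rng(0)
n, d = 512, 64
q, k, v = (rng.standard_normal((n, d)).astype(np.float32) for _ in range(3))
print(block_sparse_attention(q, k, v).shape)       # (512, 64)
```

With `topk` blocks kept out of `nb`, each query block does roughly `topk / nb` of the dense score computation, which is where the pre-filling speedup in this family of methods comes from; the accuracy cost depends on how well the cheap estimate identifies the important key blocks.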
Hour | Day  | Week | Total
44   | 1215 | 7403 | 283314